refactor(storage): inverted index support calculate the score #16609

Merged
merged 4 commits into from
Oct 18, 2024

Conversation

b41sh
Member

@b41sh b41sh commented Oct 15, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • The inverted index now supports calculating the score
  • Restore the inverted index test that was commented out

Continues #16589

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@b41sh b41sh requested a review from sundy-li October 15, 2024 08:20
@github-actions github-actions bot added the pr-refactor label (this PR changes the code base without new features or bug fixes) Oct 15, 2024

what-the-diff bot commented Oct 15, 2024

PR Summary

  • Improved Test Variables in pruning.rs
    The test variables in pruning.rs were renamed to drop the _ prefix and use their actual names (e13, e14, e15), since they are no longer unused.

  • Activated Test Assertions
    The test assertions for the variables e13, e14, and e15 were activated to ensure they function as expected.

  • Public Access Granted to TermReader in lib.rs
    TermReader was added to the pub use list in lib.rs, making it publicly accessible.

  • Added New Dependency tantivy-common
    The tantivy-common dependency was added to the project.

  • Enhanced Field Normalization Handling
    inverted_index_reader.rs now uses FieldNormReader from tantivy to handle field norms.

  • Upgraded Term ID Tracking
    The new field_term_ids tracks the term ids of each field during term info collection.

  • Improved Index Slice Logic
    The way the index slice is created was updated. It now handles score calculation and introduces new structures for managing field norms and the total number of tokens.

  • Optimized Document Collection Process
    The DocIdsCollector instantiation now uses TermReader, and the scoring calculations were adjusted accordingly.

  • Enhanced Field Norms Reading
    In inverted_index_writer.rs, the reading of field norms was streamlined so that they are extracted correctly from the segment file.
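
The summary above mentions field norms and the total number of tokens but does not show how they feed into a score. As a rough illustration only, the sketch below assumes BM25-style scoring (tantivy's default similarity), where the field norm is the token count of the field in one document and the average field length is derived from the total token count. All function names and signatures here are hypothetical and are not taken from the PR or from tantivy's API.

```rust
// Minimal BM25 sketch. The names below are illustrative assumptions,
// not the code merged in this PR.

/// Conventional BM25 parameters (these are also tantivy's defaults).
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Average field length, derived from the total number of tokens in the field.
fn avg_field_len(total_num_tokens: u64, doc_count: u64) -> f32 {
    total_num_tokens as f32 / doc_count as f32
}

/// Inverse document frequency of a term that appears in `doc_freq`
/// out of `doc_count` documents.
fn idf(doc_freq: u64, doc_count: u64) -> f32 {
    let x = (doc_count - doc_freq) as f32 + 0.5;
    let y = doc_freq as f32 + 0.5;
    (1.0 + x / y).ln()
}

/// BM25 contribution of one term to one document.
/// `field_len` plays the role of the field norm (token count of the field).
fn bm25_term_score(term_freq: u32, field_len: u32, avg_len: f32, idf: f32) -> f32 {
    let tf = term_freq as f32;
    // Length normalization: longer-than-average fields are penalized.
    let norm = K1 * (1.0 - B + B * field_len as f32 / avg_len);
    idf * tf * (K1 + 1.0) / (tf + norm)
}
```

Under these assumptions, a term that occurs more often, or in a shorter field, yields a higher score, which is why the index needs both per-document field norms and the total token count at query time.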

@b41sh b41sh added this pull request to the merge queue Oct 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 18, 2024
@b41sh b41sh added this pull request to the merge queue Oct 18, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Oct 18, 2024
@BohuTANG BohuTANG merged commit 5ec5c02 into databendlabs:main Oct 18, 2024
72 checks passed